The ISL Baseline Lecture Transcription System for the TED Corpus

نویسندگان

  • Matthias Wölfel
  • Susanne Burger
چکیده

This paper describes the Interactive Systems Laboratories’ automatic lecture transcription system for the Translanguage English Database (TED) corpus, which provides text-hypothesis for the International Workshop on Speech Summarization for Information Extraction and Machine Translation. Furthermore the paper gives a short analysis of speaking style characteristics, in particular addressing native vs. non-native speech. The data for automatic transcription is divided into a native and a nonnative test set. The best word error rate for the native data set is 28.5% while for the non-native set the best result shows 31.0%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The LIMSI RT06s Lecture Transcription System

This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL in developing a system to automatically transcribe lectures and presentations. Widely available corpora were used to train both the acoustic and language models, since only a small amount of CHIL data was available for system development. Acoustic model training made use of the transcribed portion...

متن کامل

The LIMSI RT07 Lecture Transcription System

A system to automatically transcribe lectures and presentations has been developed in the context of the FP6 Integrated Project CHIL. In addition to the seminar data recorded by the CHIL partners, widely available corpora were used to train both the acoustic and language models. Acoustic model training made use of the transcribed portion of the TED corpus of Eurospeech recordings, as well as th...

متن کامل

Language modeling and transcription of the TED corpus lectures

Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and ...

متن کامل

Transcribing lectures and seminars

This paper describes recent research carried out in the context of the FP6 Integrated Project CHIL in developing a system to automatically transcribe lectures and seminars. We made use of widely available corpora to train both the acoustic and language models, since only a small amount of CHIL data were available for system development. For acoustic model training made use of the transcribed po...

متن کامل

Advances in lecture recognition: the ISL RT-06s evaluation system

This paper describes the 2006 lecture recognition system developed at the Interactive Systems Laboratories (ISL), for individual head-microphone (IHM), single distant microphone (SDM), and multiple distant microphones (MDM) conditions. It was evaluated in RT-06S rich transcription meeting evaluation sponsored by the US National Institute of Standards and Technologies (NIST). We describe the pri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005